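Neural Radiance Fields (NeRF) and their variants have shown great success in representing 3D scenes and synthesizing photo-realistic novel views. However, they are generally based on the pinhole camera model and assume all-in-focus inputs. This limits their applicability, because images captured in the real world often have a finite depth of field (DoF). To mitigate this issue, we introduce DoF-NeRF, a novel neural rendering approach that can handle shallow-DoF inputs and simulate the DoF effect. In particular, it extends NeRF to simulate the aperture of a lens following the principles of geometric optics. This physical grounding allows DoF-NeRF to operate on views with different focus configurations. Benefiting from explicit aperture modeling, DoF-NeRF also enables direct manipulation of the DoF effect by adjusting the virtual aperture and focus parameters. It is plug-and-play and can be inserted into NeRF-based frameworks. Experiments on synthetic and real-world datasets show that DoF-NeRF not only performs comparably with NeRF in the all-in-focus setting, but can also synthesize all-in-focus novel views conditioned on shallow-DoF inputs. An interesting application of DoF-NeRF to DoF rendering is also demonstrated. The source code will be made available at https://github.com/zijinwuzijin/dof-nerf.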
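The geometric-optics idea behind aperture modeling can be illustrated with the standard thin-lens circle of confusion. This is a minimal sketch of that textbook formula only, not the DoF-NeRF implementation; the function name, parameter names, and the example lens values are assumptions chosen for illustration.

```python
def circle_of_confusion(aperture_diameter, focal_length, focus_distance, object_distance):
    """Blur-circle diameter on the sensor for a thin lens (one length unit throughout)."""
    if focus_distance <= focal_length:
        raise ValueError("focus_distance must exceed focal_length")
    # Magnification of the in-focus plane onto the sensor.
    magnification = focal_length / (focus_distance - focal_length)
    # Relative defocus of the object with respect to the focus plane.
    defocus = abs(object_distance - focus_distance) / object_distance
    return aperture_diameter * magnification * defocus


# Hypothetical 50 mm lens focused at 2 m, object at 4 m (all values in millimetres).
wide = circle_of_confusion(25.0, 50.0, 2000.0, 4000.0)    # f/2 aperture
narrow = circle_of_confusion(6.25, 50.0, 2000.0, 4000.0)  # f/8 aperture
```

An object on the focus plane yields a zero-diameter blur circle, and the blur grows linearly with the aperture diameter, which is why adjusting a virtual aperture and focus distance is enough to control the DoF effect.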
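The partial occlusion effect is a phenomenon in which blurry objects near the camera appear semi-transparent, so that the occluded background becomes partially visible. However, existing bokeh rendering methods struggle to simulate realistic partial occlusion effects, because information about the occluded area is missing from an all-in-focus image. Inspired by learnable 3D scene representations, we attempt to address partial occlusion by introducing a novel high-resolution bokeh rendering framework based on the multiplane image (MPI), termed MPIB. To this end, we first present an analysis of how to apply the MPI representation to bokeh rendering. Based on this analysis, we propose an MPI representation module combined with a background inpainting module to achieve a high-resolution scene representation. This representation can then be reused to render various bokeh effects according to the control parameters. To train and test our model, we also design a ray-tracing-based bokeh generator for data generation. Extensive experiments on synthetic and real-world images validate the effectiveness and flexibility of the framework.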
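To make the MPI representation concrete, the following sketch shows the standard over-compositing of MPI planes for a single pixel. It illustrates the general MPI rendering rule, not the MPIB pipeline itself; the function name and the toy layer values are assumptions.

```python
def composite_mpi(layers):
    """Over-composite one pixel of an MPI.

    `layers` is ordered from the nearest to the farthest plane; each entry is
    (color, alpha) with color an (r, g, b) tuple and alpha in [0, 1].
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer planes
    for color, alpha in layers:
        weight = alpha * transmittance
        for c in range(3):
            out[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return tuple(out)


# A half-transparent red foreground plane over an opaque blue background plane.
pixel = composite_mpi([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)])
```

Because each plane carries its own alpha, a blurred foreground plane with alpha below 1 lets the background plane contribute to the pixel, which is the behaviour needed for partial occlusion.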
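We propose BokehMe, a hybrid bokeh rendering framework that combines a neural renderer with a classical physically motivated renderer. Given a single image and a potentially imperfect disparity map, BokehMe generates high-resolution, photo-realistic bokeh effects with adjustable blur size, focal plane, and aperture shape. To this end, we analyze the errors of the classical scattering-based method and derive a formulation for computing an error map. Based on this formulation, we implement the classical renderer with a scattering-based method and propose a two-stage neural renderer to fix the erroneous areas produced by the classical renderer. The neural renderer employs a dynamic multi-scale scheme to handle arbitrary blur sizes efficiently, and it is trained to handle imperfect disparity input. Experiments show that our method compares favorably against previous methods on both synthetic image data and real image data with predicted disparity. A user study is further conducted to validate the advantage of our method.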
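As a rough illustration of what a scattering-based renderer does, the sketch below spreads each pixel of a 1D image over a window whose radius grows with its distance from the focal plane in disparity space. It is a deliberately naive toy under stated assumptions (1D image, uniform kernel, no occlusion handling), not BokehMe's classical renderer; all names and parameters are hypothetical.

```python
def blur_radius(disparity, focus_disparity, blur_scale):
    """Blur radius in pixels, proportional to the distance from the focal plane."""
    return blur_scale * abs(disparity - focus_disparity)


def scatter_render_1d(image, disparity, focus_disparity, blur_scale):
    """Scatter every pixel of a 1D image uniformly over its blur window."""
    n = len(image)
    accum = [0.0] * n
    weight = [0.0] * n
    for i in range(n):
        r = int(round(blur_radius(disparity[i], focus_disparity, blur_scale)))
        lo, hi = max(0, i - r), min(n - 1, i + r)
        share = 1.0 / (2 * r + 1)  # uniform kernel; energy outside the image is dropped
        for j in range(lo, hi + 1):
            accum[j] += share * image[i]
            weight[j] += share
    # Normalise by the accumulated weight so that flat regions keep their value.
    return [a / w for a, w in zip(accum, weight)]


# One bright, out-of-focus pixel surrounded by dark, in-focus pixels.
blurred = scatter_render_1d([0.0, 0.0, 1.0, 0.0, 0.0], [0, 0, 1, 0, 0], 0.0, 1.0)
```

A naive scatter like this ignores the depth order of overlapping contributions, so it is least reliable near depth discontinuities; a learned component can then be targeted at such regions.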
In recent years, arbitrary image style transfer has attracted increasing attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance content details against style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes the objects in the image can no longer be distinguished clearly. For this reason, we present a new transformer-based method for image style transfer, named STT, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for person re-identification. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address this issue. Specifically, MSTAT consists of three stages that encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all stages for the final identification. In practice, to reduce the computational cost, Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct self-attention along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of them are realized with newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting informative and discriminative information from videos, and show that MSTAT achieves state-of-the-art accuracy on various standard benchmarks.
Machine learning models are typically evaluated by computing their similarity with reference annotations and trained by maximizing that similarity. Especially in the biomedical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotator's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
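One simple way to quantify inter-rater reliability for segmentation-style annotations is the mean pairwise Dice score across raters. The sketch below is an illustration under assumptions: the abstract does not state which reliability measure is used, so the choice of Dice, the averaging scheme, and the toy masks are all hypothetical.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice similarity of two equally sized binary masks (flat sequences of 0/1)."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_inter_rater_dice(masks):
    """Average pairwise Dice over all raters' annotations of the same case."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical raters annotating the same six-pixel case.
raters = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
agreement = mean_inter_rater_dice(raters)
```

If human raters agree with each other only to this degree, a model whose similarity to one rater's annotation is far above it may be fitting that rater's idiosyncrasies rather than the underlying reality.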
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. Motivated by the fact that informative point cloud features should be able to encode rich geometry and appearance cues and render realistic images, we train a point-cloud encoder within a specially designed point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks such as 3D detection and segmentation, but also low-level tasks such as 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach over existing pre-training methods.
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services, which require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture the channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with high probability.
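The usual way Lyapunov optimization turns a long-term constraint into a per-step quantity is a virtual queue that accumulates constraint violations. The sketch below shows that generic mechanism for a long-term accuracy requirement; it is not the paper's exact formulation, and the function names, the trade-off weight `tradeoff_v`, and the example accuracies are assumptions.

```python
def update_virtual_queue(queue, accuracy_target, achieved_accuracy):
    """Grow the queue by this slot's accuracy deficit, never dropping below zero."""
    return max(queue + accuracy_target - achieved_accuracy, 0.0)


def per_slot_cost(delay, queue, accuracy_target, achieved_accuracy, tradeoff_v):
    """Drift-plus-penalty cost: weighted delay plus queue-weighted accuracy deficit."""
    return tradeoff_v * delay + queue * (accuracy_target - achieved_accuracy)


queue = 0.0
history = []
for achieved in [0.80, 0.85, 0.95, 0.99]:  # made-up accuracies per time slot
    queue = update_virtual_queue(queue, 0.90, achieved)
    history.append(queue)
```

Keeping the virtual queue stable implies that the time-averaged accuracy meets the target, so an RL agent can be trained on the negative of the per-slot cost as an ordinary unconstrained reward.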
Traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of that quantity. In some sequential estimation problems, however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problem has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning, and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
Most Graph Neural Networks follow the message-passing paradigm, assuming that the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are often incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying self-organization and revealing hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing how the nodes interact and self-organize within the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.